Introduction

This document collects the information and figures used to name the clusters in experiment one of the Sang scWBM dataset.

Cluster naming workflow

  1. Using SingleR to compare the cluster centroids in our analysis to reference datasets, to find the reference cell type with the highest correlation to each of our clusters. The reference datasets are: 830 microarray samples of pure mouse immune cells, generated by the Immunologic Genome Project (ImmGen; Aran et al., 2019); and 358 bulk RNA-seq samples of sorted cell populations, available from GEO (Benayoun et al., 2019).

  2. Using SingleR to compare individual cells to the reference cell types (same as step 1).

  3. Using marker gene expression to either confirm or alter cluster labels generated from steps 1 and 2.
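The idea behind step 1 can be sketched in a few lines. This is only a simplified illustration, assuming Spearman correlation between a cluster centroid and each reference profile; SingleR itself does more than this (for example gene selection and fine-tuning), and the analysis here was not run with this code. The function names and all expression values below are invented for the example.

```python
def ranks(values):
    """Return 1-based ranks, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation; returns 0.0 if either vector is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    if sx == 0 or sy == 0:
        return 0.0
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman correlation is the Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def best_reference(centroid, references):
    """Return (label, score) of the reference profile most correlated with the centroid."""
    scores = {name: spearman(centroid, profile) for name, profile in references.items()}
    best = max(scores, key=scores.get)
    return best, scores[best]


# Toy data: gene order is Cd3g, Itga2b, Pf4, Lyz2, Gypa. Values are invented.
centroid = [0.1, 3.5, 4.0, 1.2, 0.0]
references = {
    "T cells": [5.0, 0.2, 0.1, 0.5, 0.0],
    "Megakaryocytes": [0.0, 4.2, 5.1, 0.8, 0.1],
}
label, score = best_reference(centroid, references)
print(label, round(score, 2))
```

Step 2 follows the same pattern, with each individual cell's expression vector taking the place of the cluster centroid.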

SingleR Cluster Naming

Above is a graphic from a package associated with SingleR. It is interesting, but it has too much going on.

Each cluster was given a final score. For each cluster, the label from whichever of the two references gave the higher score was used as the final label.
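The selection rule above can be written out as a small sketch. The cluster IDs, labels, and scores below are placeholders invented for illustration, not results from this analysis, and breaking ties in favor of the first reference is my assumption.

```python
def pick_final_labels(ref_a, ref_b, name_a="ImmGen", name_b="RNA-seq"):
    """For each cluster, keep the (label, score) pair from the higher-scoring reference.

    ref_a and ref_b map cluster id -> (label, score).
    Ties go to ref_a (an assumption; the report does not say how ties were handled).
    """
    final = {}
    for cluster, (label_a, score_a) in ref_a.items():
        label_b, score_b = ref_b[cluster]
        if score_a >= score_b:
            final[cluster] = (name_a, label_a, score_a)
        else:
            final[cluster] = (name_b, label_b, score_b)
    return final


# Placeholder values only.
immgen_calls = {"c1": ("label_a", 0.82), "c2": ("label_b", 0.40)}
rnaseq_calls = {"c1": ("label_a", 0.75), "c2": ("label_c", 0.55)}

final_labels = pick_final_labels(immgen_calls, rnaseq_calls)
for cluster, (source, label, score) in final_labels.items():
    print(cluster, source, label, score)
```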

SingleR Cell Naming

##    0    1    2    3    4    5    6    7    8    9   10   11   12 
## 1000  828  813  774  541  538  451  351  218  194  191  151   71
## [1] 18
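One way to carry per-cell labels back to the cluster level is a majority vote within each cluster. This report does not say how the per-cell calls were summarized, so the sketch below is an assumption, and the cluster IDs and labels are placeholders.

```python
from collections import Counter, defaultdict


def summarize_labels(clusters, labels):
    """Return cluster -> (most common label, fraction of cells carrying it)."""
    per_cluster = defaultdict(Counter)
    for cluster, label in zip(clusters, labels):
        per_cluster[cluster][label] += 1
    summary = {}
    for cluster, counts in per_cluster.items():
        label, n = counts.most_common(1)[0]
        summary[cluster] = (label, n / sum(counts.values()))
    return summary


# Placeholder per-cell calls: one entry per cell.
cell_clusters = [0, 0, 0, 1, 1]
cell_labels = ["label_a", "label_a", "label_b", "label_c", "label_c"]

cluster_summary = summarize_labels(cell_clusters, cell_labels)
print(cluster_summary)
```

A low majority fraction would flag clusters where the per-cell calls disagree and the marker-based check matters most.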

Lineage Specific Markers

Next, I use lineage-specific markers to inspect the cell clusters manually. The marker list was provided by Dr. Sang.
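The calls in the sections that follow were made by eye from the plots. A numeric companion to that kind of inspection is the fraction of cells in each cluster that express a marker at all. The sketch below is illustrative only: the function name, the zero threshold, and the toy values are my assumptions, not part of the original analysis.

```python
from collections import defaultdict


def fraction_expressing(expression, clusters, threshold=0.0):
    """Return cluster -> fraction of cells whose expression exceeds the threshold."""
    totals = defaultdict(int)
    positive = defaultdict(int)
    for value, cluster in zip(expression, clusters):
        totals[cluster] += 1
        if value > threshold:
            positive[cluster] += 1
    return {cluster: positive[cluster] / totals[cluster] for cluster in totals}


# Toy per-cell values for a single marker; clusters "A" and "B" are placeholders.
marker_expression = [2.1, 0.0, 1.4, 0.0, 0.0, 0.0]
cell_clusters = ["A", "A", "A", "B", "B", "B"]

fractions = fraction_expressing(marker_expression, cell_clusters)
print(fractions)
```

A marker that is "almost exclusive" to one cluster would show a high fraction there and fractions near zero elsewhere.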

Granulocytes

## [1]  TRUE  TRUE  TRUE  TRUE FALSE

These markers are consistently expressed only in cluster 6, with some marker expression in clusters 4, 5, and 7.

B-cell

Jchain showed little to no expression in any cluster.

Otherwise, expression is found almost exclusively in clusters 3 and 10.

Megakaryocytes

Most expression was found in cluster 12, with strong Itga2b expression in cluster 4 and some Pf4 expression in cluster 8.

HSPC

## Warning in SingleExIPlot(type = type, data = data[, x, drop = FALSE], idents =
## idents, : All cells have the same value of Emcn.
## Warning in SingleExIPlot(type = type, data = data[, x, drop = FALSE], idents =
## idents, : All cells have the same value of Avp.

Emcn, Avp, Cd34, and Hlf showed little to no expression in any cluster.
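The warnings above are raised when every cell has the same value for a gene, so there is nothing to plot. A simple pre-check can separate those genes before plotting. This is a minimal sketch with invented values; the function name is hypothetical and is not part of any plotting package.

```python
def split_constant_markers(expression_by_gene):
    """Split genes into (variable, constant) by whether values differ across cells."""
    variable, constant = [], []
    for gene, values in expression_by_gene.items():
        if len(set(values)) <= 1:
            constant.append(gene)  # same value in every cell, nothing to plot
        else:
            variable.append(gene)
    return variable, constant


# Toy per-cell values; Emcn and Avp are all zero, as the warnings report.
toy_expression = {
    "Emcn": [0.0, 0.0, 0.0, 0.0],
    "Avp": [0.0, 0.0, 0.0, 0.0],
    "Kit": [0.0, 1.3, 0.0, 0.7],
}

variable_genes, constant_genes = split_constant_markers(toy_expression)
print("plot:", variable_genes)
print("skip:", constant_genes)
```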

Crhbp has mild expression in cluster 10, and Kit has mild expression in cluster 6.

Monocyte

Cd14 showed little to no expression.

Macrophage

Lyz2 is widely expressed across all clusters.

Erythroid

Epor, Klf1, and Tfr2 showed little to no expression in any cluster.

Csf2rb shows widespread expression, with the highest expression in cluster 4. Gypa is expressed only in cluster 9.

T-cell/NK

Cd3g shows expression in cluster 11, almost exclusively.

MEP

Gpr141 shows expression in multiple clusters. Expression of Gata1 is exclusive to cluster 4.

Lineage Specific Markers cont.

These are markers that I found through a literature search and by looking at PanglaoDB (for example, the top markers for HSPCs).

Many of the markers are the same between my list and Dr. Sang’s list.
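The overlap between the two lists can be checked with set intersections. In the sketch below, Dr. Sang's list is only partial: it contains the genes named in the prose of this report, which may not be the full list. The second list covers four lineages copied from the table in this section. The variable names are mine.

```python
# Partial: only the genes named in the text of this report.
sang_markers = {
    "MK": {"Itga2b", "Pf4"},
    "Monocyte": {"Cd14"},
    "Macrophage": {"Lyz2"},
    "Erythroid": {"Epor", "Klf1", "Tfr2", "Csf2rb", "Gypa"},
}

# Four lineages copied from the literature / PanglaoDB table in this section.
literature_markers = {
    "MK": {"Itga2b", "Pf4", "Selp", "Cd47", "Gata2", "Plk3",
           "Runx1", "Cfp", "Tspan9", "Treml1"},
    "Monocyte": {"Clec12a", "Cxcl10", "Psap", "ifitm3"},
    "Macrophage": {"Cd14", "Ccr5", "Cd5", "Slamf9", "Lilra5",
                   "Mgl2", "Ccl12", "Clec4a2"},
    "Erythroid": {"Klf1", "Tmod1", "Ank1", "Alas2", "Bpgm", "Rhag", "Grsf1"},
}

# Overlap within the same lineage.
same_lineage = {
    lineage: sang_markers[lineage] & literature_markers[lineage]
    for lineage in sang_markers
}

# Overlap regardless of which lineage each list files the gene under.
all_sang = set().union(*sang_markers.values())
all_literature = set().union(*literature_markers.values())
any_lineage = all_sang & all_literature

print(same_lineage)
print(sorted(any_lineage))
```

The second comparison also picks up genes that the two lists assign to different lineages, such as Cd14, which appears under Monocyte in one list and under Macrophage in the other.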

##    B.cell     MK   HSPC Monocyte Macrophage Erythroid T.cell.NK    MEP
## 1    Ighd Itga2b   Fgd5  Clec12a       Cd14      Klf1     Gata3 Tspan9
## 2    Sox4    Pf4  Mecom   Cxcl10       Ccr5     Tmod1     Tbx21 Treml1
## 3           Selp   Egr1     Psap        Cd5      Ank1      Rorc  Cd59a
## 4           Cd47  Ncor2   ifitm3     Slamf9     Alas2     Foxp3       
## 5          Gata2  Thsd1              Lilra5      Bpgm       Cd5       
## 6           Plk3 Nkx3-1                Mgl2      Rhag     Il2rb       
## 7          Runx1    Hlx               Ccl12     Grsf1    Zfp683       
## 8            Cfp  Ilr3a             Clec4a2                           
## 9         Tspan9   Cd33                                               
## 10        Treml1  Itga4                                               
## 11                Anpep                                               
##    ProliferationMarkers Myeloid  Eprog Plasma
## 1                   Mpo   Csf3r Hba-a1   Sik1
## 2                 Mki67   Elane Hbb-bt       
## 3                 Top2a     Mpo Hba-a2       
## 4                                 Snca       
## 5                                Cd59a       
## 6                                            
## 7                                            
## 8                                            
## 9                                            
## 10                                           
## 11
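Some genes in the table above are listed under more than one lineage, which matters when a marker is used to separate those lineages. The sketch below copies the table into a dictionary (gene names exactly as printed) and reports every gene that appears in more than one column. The variable names are mine.

```python
from collections import defaultdict

# Columns of the marker table, copied as printed.
marker_table = {
    "B.cell": ["Ighd", "Sox4"],
    "MK": ["Itga2b", "Pf4", "Selp", "Cd47", "Gata2", "Plk3",
           "Runx1", "Cfp", "Tspan9", "Treml1"],
    "HSPC": ["Fgd5", "Mecom", "Egr1", "Ncor2", "Thsd1", "Nkx3-1",
             "Hlx", "Ilr3a", "Cd33", "Itga4", "Anpep"],
    "Monocyte": ["Clec12a", "Cxcl10", "Psap", "ifitm3"],
    "Macrophage": ["Cd14", "Ccr5", "Cd5", "Slamf9", "Lilra5",
                   "Mgl2", "Ccl12", "Clec4a2"],
    "Erythroid": ["Klf1", "Tmod1", "Ank1", "Alas2", "Bpgm", "Rhag", "Grsf1"],
    "T.cell.NK": ["Gata3", "Tbx21", "Rorc", "Foxp3", "Cd5", "Il2rb", "Zfp683"],
    "MEP": ["Tspan9", "Treml1", "Cd59a"],
    "ProliferationMarkers": ["Mpo", "Mki67", "Top2a"],
    "Myeloid": ["Csf3r", "Elane", "Mpo"],
    "Eprog": ["Hba-a1", "Hbb-bt", "Hba-a2", "Snca", "Cd59a"],
    "Plasma": ["Sik1"],
}

# Invert the table: gene -> list of lineages that list it.
lineages_by_gene = defaultdict(list)
for lineage, genes in marker_table.items():
    for gene in genes:
        lineages_by_gene[gene].append(lineage)

shared_markers = {
    gene: lineages
    for gene, lineages in lineages_by_gene.items()
    if len(lineages) > 1
}

for gene in sorted(shared_markers):
    print(gene, shared_markers[gene])
```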

B-cells

Clusters 3 and 10 are high in B-cell markers, which is consistent across all the stages.

MKs

HSPCs

## Warning in SingleExIPlot(type = type, data = data[, x, drop = FALSE], idents =
## idents, : All cells have the same value of Nkx3-1.

## Warning in FeaturePlot(wbm, features = mrkrs2, ncol = 2): All cells have the
## same value (0) of Nkx3-1.

Monocytes

Macrophages

Erythroid

T-cell/NK

MEP

Proliferation

Myeloid

Erythroid Prog.

Plasma